SLING-13021 concurrent import of packages #180
Open
joerghoh wants to merge 10 commits into

Ability to concurrently import packages.
The most important changes are in the BookKeeper, where an offset is stored only if no message with a lower offset is currently being processed. In other words, an offset is persisted only once all "older" messages have been processed.
If the concurrency is set to "1" (= serialized), the import semantics do not change at all: in this case every older message has already been processed, and therefore the offset of every package is stored.
If the concurrency is higher (true parallel import), the processing of some messages may not be persisted, because "older" messages (messages with a smaller offset) are still being processed. In such a case the import semantics change from "successfully imported exactly once" to "successfully imported at least once"; this only works under the assumption that every message is idempotent and that re-importing it (in the correct order) is possible without side effects.
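To make the offset rule easier to review, here is a minimal sketch of the idea. It is not the code in this PR: the class `InFlightOffsets` and its methods `begin` and `complete` are hypothetical names chosen for illustration, and the sketch assumes that messages are started in offset order.

```java
import java.util.OptionalLong;
import java.util.TreeSet;

/**
 * Hypothetical helper illustrating the rule described above.
 * It is NOT the BookKeeper implementation from this PR.
 */
final class InFlightOffsets {

    // Offsets of messages whose import has started but not yet finished.
    private final TreeSet<Long> inFlight = new TreeSet<>();

    /** Register a message as "being processed". */
    synchronized void begin(long offset) {
        inFlight.add(offset);
    }

    /**
     * Mark a message as processed.
     *
     * @return the offset to persist, or empty if an "older" message
     *         (smaller offset) is still being processed.
     */
    synchronized OptionalLong complete(long offset) {
        inFlight.remove(offset);
        boolean olderPending = !inFlight.isEmpty() && inFlight.first() < offset;
        return olderPending ? OptionalLong.empty() : OptionalLong.of(offset);
    }
}
```

With offsets 1, 2 and 3 in flight, completing 3 and then 2 persists nothing; completing 1 persists offset 1. A restart at that point would re-import 2 and 3, which is the "at least once" case. With a concurrency of 1 there is never an older message in flight, so every completed offset is persisted.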
In the context of distribution, this also means that there must not be any dependency between "adjacent packages" (that is, between packages which might be imported concurrently). For example, if there are 2 messages sent:
and these packages are submitted in close succession and happen to be imported concurrently, the result is not guaranteed. For that reason, concurrent import must only be enabled if this guarantee is given from outside.
For the reviewers:
BookKeeper.importPackage() and BookKeeper.invalidatePackage() in this particular way; skipping a package is always idempotent, and a skipped package is ignored when deciding whether an offset should be stored.